Abstract. The work presented in this paper is part of the cooperative
research project AUTO-OPT carried out by twelve partners from the
automotive industries. One major work package concerns the application of
data mining methods in the area of automotive design. Suitable methods 
for data preparation and data analysis are developed. The objective of 
the work is the re-use of data stored in the crash-simulation department 
at BMW in order to gain deeper insight into the interrelations between 
the geometric variations of the car during its design and its performance 
in crash testing. In this paper a method for data analysis of finite element 
models and results from crash simulation is proposed and application to 
recent data from the industrial partner BMW is demonstrated. All nec- 
essary steps from data pre-processing to re-integration into the working 
environment of the engineer are covered. 

1 Introduction 

The objective of the data mining work presented in this paper is the re-use of
data stored in the crash-simulation department at BMW in order to gain deeper
insight into the interrelations between the geometric variations of the car
during its design and its performance in crash testing. The aim is thus to find
hidden knowledge in stored data. In principle, various questions could be posed
for such a knowledge mining analysis:

- Which innovations have evolved during the design process?
- Were certain steps in the development unnecessary, or could they be shortened?
- Is it possible to extract analogies between different car projects?
- Can the reasons that have led to certain design decisions be reproduced?
- Can this reasoning be applied to future projects?

The data mining project in AUTO-OPT aims at examining the applicability of 
data mining methods on crash simulation data [1]. Due to the fact that design 
and development knowledge is the major asset of engineering, an automotive 
company cannot be expected to share large amounts of their data for research 
reasons. On the other hand, interesting results from data mining can only be 
achieved from interesting data. Therefore, in this work only the applicability
of the method is demonstrated; its value cannot be evaluated on the data basis
available. This will be addressed in future work.

The crash department at BMW stores all relevant information in the sim- 
ulation data management system CAE-Bench [2]. Data mining queries are to 
be submitted from this environment. Results have to be brought back into this 
system and assigned to the underlying models, i.e. stored within their audit trail. 
This procedure is schematically shown in Figure 1. 



[Figure 1: schematic of the data flow. Working environment of the engineer (CAE-Bench: models, simulations, results, reports); disassembler and data preparation (parts and meta data, results); data reduction (similarity measure, modification analysis, subset evaluation); working environment for data mining (DM data base with relevant models, reduced meta data and reduced results; data mining algorithms in Weka; data mining reports).]

Fig. 1. Procedure for data mining. Models and crash results are exported from CAE- 
Bench, models are disassembled into parts, geometry based meta data are calculated, 
parts and meta data are stored within SCAI-DM, crash results are attached to meta 
data, data mining tables are assembled, DM analysis is performed, result files are 
produced and exported. 



Fraunhofer SCAI has been provided with data from one of the most recent
car projects at BMW. The vehicle under development is shown in Figure 2
(left). Each data set describes one stage of construction of this car within the
development process via finite element models (FE models) made up of about
500,000 independent nodes and elements.

Each car is composed of approximately 1200 parts. CAE-Bench stores the models
as complete vehicles, i.e. one single large FE model, called an input deck. In
order to analyse the geometry of the parts, these input decks need to be
disassembled, as shown in Fig. 2 (right) for an older BMW vehicle. The new model
cannot be shown in such detail because of a nondisclosure agreement.



Fig. 2. Recent model of BMW employed for data mining (left). One FE input deck 
consisting of numerous parts (right). 

2 Preparation of the data for data mining 

It is generally accepted that the preparation of the data involves as much as 
80-90% of the effort when a data mining task is attempted, see e.g. [3]. The 
data cannot be processed by a data mining tool in their original format. To the
best of the authors' knowledge, no approach for data mining on raw finite
element data exists. The preparation of the data thus constitutes the main challenge for the
data mining approach on the FE data. In addition, the data has to be cleaned 
and checked for consistency and the appropriate values have to be combined. As 
a first but major step a process for data preparation has been developed: 

a) Export of data from CAE-Bench 

b) Disassembling into parts and computation of meta data 

c) Data cleaning and sorting — clustering of parts 

d) Similarity analysis and data reduction — clustering of variants 

e) Evaluation and cleaning of crash result data 

As a result of this procedure a table is generated that allows for access to the 
data with data mining algorithms. This section focuses on the preparation of the 
data, whereas the application of the data mining algorithms will be presented in 
section 3. 

a) Export of data from CAE-Bench 

CAE-Bench can export selected input decks along with the results obtained when
these models were subjected to a virtual crash test. An example is shown in
Fig. 4. Information is extracted from this export, such that the relevant crash 
results can be attached to the respective input deck data and stored in the SCAI 
data mining framework (SCAI-DM). 

b) Disassembling into parts and computation of meta data 

Motivation. The data mining approach in this work concerns the shape of the 
parts of the car. The aim is to analyse how changes in shape have influenced crash 
behaviour. The FE-model itself contains all geometrical information. However,
this information is hidden from data mining algorithms, as these cannot extract 
meaningful knowledge from node and element descriptions. Meta data has to be 
determined such that it quantifies geometry in an appropriate manner. In this 
work several values have been chosen as meta data, e.g. the centre of gravity 
of each part, the moments of inertia, the length of edges and margins, surface 
size, bounding box, length of branching lines — as shown in Figure 3. All of these 
are mesh independent. They thus enable comparison between models that have 
been meshed with different algorithms or programs. The meta data reduce the 
amount of data massively, such that handling of data is facilitated considerably. 




Fig. 3. Typical meta data and their appearance in an example part. 



Reading the Input Deck. Today the body-shell of a finite element car model
is described by an input deck of 100 MB containing approximately 1,500,000
lines. Figure 4 shows a small subset of such an input deck. The part number
indicates which element of the meshes belongs to which specific part of the car. 
The material section defines a homogeneous density and thickness for each part. 
In the disassembling procedure all elements with the same part number and their 
respective nodes are extracted from the input deck and form one new mesh for 
this single part. 
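
The disassembling step described above can be sketched as follows. This is a minimal illustration, not the disassembler used in the project: the record layout (a dictionary of node coordinates and a list of `(element_id, part_id, node_ids)` tuples) and the function name `disassemble` are assumptions made for the example, and parsing of the actual input deck format is left out.

```python
from collections import defaultdict

def disassemble(nodes, shells):
    """Split one input deck into per-part sub-meshes.

    nodes:  {node_id: (x, y, z)}                      (hypothetical layout)
    shells: iterable of (element_id, part_id, node_ids) records
    Returns {part_id: {"nodes": {...}, "shells": [...]}}.
    """
    parts = defaultdict(lambda: {"nodes": {}, "shells": []})
    for element_id, part_id, node_ids in shells:
        sub_mesh = parts[part_id]
        sub_mesh["shells"].append((element_id, node_ids))
        # Copy only the nodes that this part actually references.
        for node_id in node_ids:
            sub_mesh["nodes"][node_id] = nodes[node_id]
    return dict(parts)

# Two quadrilateral shells that belong to two different parts.
nodes = {1: (0, 0, 0), 2: (1, 0, 0), 3: (1, 1, 0),
         4: (0, 1, 0), 5: (2, 0, 0), 6: (2, 1, 0)}
shells = [(10, 11011, (1, 2, 3, 4)), (11, 11013, (2, 5, 6, 3))]
sub_meshes = disassemble(nodes, shells)
```

Nodes shared by two parts (here nodes 2 and 3) are copied into both sub-meshes, so that each sub-mesh can be processed on its own.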

For each part the disassembler thus extracts a sub-mesh of the input deck. 
This sub-mesh is the basis of the calculation of the meta data. The sub-mesh 
files are also used to create previews of the parts, either from three different
angles or, on demand, in the form of a three-dimensional applet visualisation
[4-6]. Since the generation of previews of parts is a time-intensive process, it is initiated only for
new parts which have not previously been stored in the database. 

Computation of Meta Data. For meta data calculation various details on 
FE models have to be taken into account. The model surfaces here are curved. 

Fig. 4. Excerpt of an input deck: the NODE and SHELL sections describe the geometry
of the finite element meshes; in the MATER section the thickness d and density ρ of the
material can be found. Besides shell elements, which can be triangles or quadrilaterals,
additional elements such as membrane elements (4 nodes), solids (8 nodes), beams (2
nodes), or bars (3 nodes) appear in an input deck.



Using shell elements for the description means that the four corner points of a
quadrilateral do not necessarily lie in one common plane, see Fig. 3. One
well-defined way to calculate their surface is
$S \approx \frac{1}{4}\,|(\mathbf{a} + \mathbf{c}) \times (\mathbf{b} + \mathbf{d})|$.
The mass of an element is given by its surface multiplied by the material
thickness $d$ and density $\rho$. The centre of gravity, at which the mass $m$
of a single element is assumed to be located, is positioned approximately at
$\frac{1}{4}(A + B + C + D)$, where $A, \dots, D$ are the corner points of the
SHELL element. The centre of gravity and the moments of inertia of the complete
part are then given by sums over all point masses. For every element a normal
vector is constructed by
$\mathbf{n} = \frac{\mathbf{a} + \mathbf{c}}{|\mathbf{a} + \mathbf{c}|} \times \frac{\mathbf{b} + \mathbf{d}}{|\mathbf{b} + \mathbf{d}|}$.
The normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ of adjacent elements are
used for detecting edges. If the angle
$\alpha = 2 \arcsin(|\mathbf{n}_2 - \mathbf{n}_1| / 2)$ is larger than a
user-defined value, the connecting line between the elements is called an edge
of the mesh. If one side of an element is not connected to any further element,
this line is assumed to be a margin of the structure. In this manner all meta
data characterising the geometry of each part are computed.
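
The element-wise quantities above can be illustrated with a short sketch. Several points are assumptions made for this example rather than details confirmed by the text: the edge vectors are taken as a = B - A, c = C - D, b = C - B and d = D - A (opposite edges oriented alike), the surface prefactor is 1/4 (exact for a planar quadrilateral under this convention), and the normal is scaled to unit length because the angle formula requires unit normals. All function names are hypothetical.

```python
import math

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def length(u):
    return math.sqrt(sum(ui * ui for ui in u))

def shell_meta(A, B, C, D, thickness, density):
    """Surface, mass, centre and unit normal of one quadrilateral shell."""
    ac = add(sub(B, A), sub(C, D))   # a + c (assumed edge orientation)
    bd = add(sub(C, B), sub(D, A))   # b + d (assumed edge orientation)
    n = cross(ac, bd)
    area = 0.25 * length(n)
    mass = area * thickness * density
    centre = tuple((A[i] + B[i] + C[i] + D[i]) / 4.0 for i in range(3))
    normal = tuple(ni / length(n) for ni in n)
    return area, mass, centre, normal

def edge_angle(n1, n2):
    """Angle between two unit normals, alpha = 2 arcsin(|n2 - n1| / 2)."""
    return 2.0 * math.asin(min(1.0, length(sub(n2, n1)) / 2.0))

def part_centre_of_gravity(point_masses):
    """Centre of gravity of a part from (mass, centre) pairs of its elements."""
    total = sum(m for m, _ in point_masses)
    return tuple(sum(m * c[i] for m, c in point_masses) / total
                 for i in range(3))

# Two unit squares that share an edge and meet at a right angle.
flat = shell_meta((0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                  thickness=0.001, density=7850.0)
wall = shell_meta((1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1),
                  thickness=0.001, density=7850.0)
angle = edge_angle(flat[3], wall[3])
cog = part_centre_of_gravity([(flat[1], flat[2]), (wall[1], wall[2])])
```

With a user-defined threshold of, say, 30 degrees, the shared side of these two elements would be classified as an edge of the mesh, since the normals enclose a right angle. Degenerate elements with zero surface are not handled in this sketch.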



c) Data cleaning and sorting — clustering of parts 

The finite element models are subject to numerous kinds of modifications. During 
the engineering process in which a car model is improved with respect to its 
crash-worthiness a subset of parts is modified. In general the parts modified are 
the crash-relevant ones. Additional modifications follow the demands of other 
engineering disciplines, e.g. holes may be inserted into the parts in order to 
achieve a better drain of varnish during production. Such measures reduce crash- 
worthiness, which then again has to be improved by further modifications. 

Fig. 5. Screenshot of the SCAI-DM framework, here showing some variations of part
no. 11171 with meta data and checksums. The database contains about 146,000 parts
belonging to 134 different crash tests. By deleting all sub-mesh files with duplicate
MD5 checksums (right column), the database can be reduced by 93% to 9900 different parts.

However, not all parts are modified in all stages of car design. As unchanged 
parts cannot be responsible for deviations in the simulation results, such parts 
can be excluded from the analysis. In order to remove the parts that have never
been modified, MD5 checksums are created for all sub-mesh files, Fig. 5 (right
column). If the checksum of a part stays constant across all data sets of
interest, the part was left unchanged and one single reference to the sub-mesh
file is stored. Only parts with more than one distinct instance in the database
are included in the data mining queries.
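
The checksum-based filtering can be sketched with Python's standard `hashlib` module. The tuple layout, the function names and the byte strings standing in for sub-mesh files are assumptions made for this illustration only.

```python
import hashlib

def md5_of(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()

def modified_parts(sub_meshes):
    """Keep only parts that occur with more than one distinct checksum.

    sub_meshes: iterable of (part_id, crash_test_id, content_bytes).
    Returns {part_id: {checksum: first crash_test_id seen with it}}.
    """
    variants = {}
    for part_id, test_id, content in sub_meshes:
        # One reference per distinct checksum; duplicates are dropped.
        variants.setdefault(part_id, {}).setdefault(md5_of(content), test_id)
    return {p: v for p, v in variants.items() if len(v) > 1}

# Placeholder file contents, not real input deck syntax.
sub_meshes = [
    (11011, "test_001", b"NODE 1 0.0 0.0 0.0"),
    (11011, "test_002", b"NODE 1 0.0 0.0 0.0"),   # unchanged part
    (11013, "test_001", b"NODE 7 1.0 0.0 0.0"),
    (11013, "test_002", b"NODE 7 1.0 0.5 0.0"),   # modified part
]
changed = modified_parts(sub_meshes)
```

Note that a checksum only detects that a file differs; whether the difference is relevant has to be decided afterwards by comparing the meta data.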

SCAI DM Data Base. After disassembling all parts are stored in the SCAI- 
DM framework along with their meta data, as shown in Fig. 5. Depending on 
the purpose parts and data can be displayed in any other combination. 

Avoiding Inconsistent Naming/Numbering. One bottleneck for data mining
of the BMW data is the fact that text entries in the data management system
are free text. Some conventions are complied with in the majority of cases.
Repeatedly, however, re-naming and re-numbering of parts was encountered in
the data, which showed that the rules were not consistently followed. To avoid
irrelevant results, it is therefore vital that all data entering the analysis
adhere to the same rules. The safest way to achieve correct data is to
avoid the text entries in CAE-Bench altogether and use the FE descriptions as
a basis. This in turn implies that an automatic method to identify parts has to
be set up such that the use of part numbers or part names coming from CAE-Bench
is avoided. The meta data calculated from the FE model can be the basis for part
identification using cluster analysis. The clustering process divides a dataset into
mutually exclusive groups such that the members of each group are as "close"
as possible to one another, and different groups are as "far" as possible from one
another, where distance is measured with respect to all available variables, see
e.g. [7].

Each meta data property spans a new dimension in the similarity space. 
The meta data of a specific part are represented by a point in the multidimen- 
sional similarity space. Similar parts have similar meta data and form a cloud 
of adjacent points. Figure 6 (left) shows the idea of clustering of meta data in a 
schematic diagram: Two dimensions of the similarity space are shown. The meta 
data of the parts L and C form two clouds, in which a substructure indicates 
the presence of several modifications. Figure 6 (right) shows a clustering plot 
of the BMW data. The dots belong to two different parts with the same part
number 11011: "Motorträger" (three clusters on the right) and "Schottblech
Motorträger" (small cluster in the centre of the diagram). This is an example of a
change in the numbering of a part.
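
How clusters in the meta data space can reveal such a re-numbering may be illustrated with a very small clustering sketch. It uses a simple distance threshold (single-linkage grouping), which is a stand-in chosen for brevity and not necessarily the clustering algorithm used in the project; the two meta data dimensions and their values are made up, and real meta data would need to be scaled to comparable ranges first.

```python
import math

def single_linkage(points, threshold):
    """Label points so that any two closer than `threshold` share a cluster."""
    parent = list(range(len(points)))   # every point starts as its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)   # merge the two clusters

    roots = sorted({find(i) for i in range(len(points))})
    return [roots.index(find(i)) for i in range(len(points))]

# Hypothetical (surface, mass) meta data of five sub-meshes that all carry
# the same part number.
meta = [(1.00, 2.0), (1.02, 2.0), (1.01, 2.1), (5.0, 0.5), (5.1, 0.5)]
labels = single_linkage(meta, threshold=0.5)
```

Two well-separated clusters under one part number, as in this toy data, would indicate that the number has been assigned to two different parts.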



d) Similarity analysis and data reduction — clustering of variants 

Differing checksums indicate that a sub-mesh was modified in some unknown 
way. Then negligible file modifications have to be distinguished from relevant 
changes such as modified shapes. In this framework for data mining the geomet- 
ric meta data, as described above, serves as a similarity measure for the parts. 
Minor and major changes of the part design will result in a hierarchical
structure of clouds and sub-clouds, see Fig. 6 (left). Using hierarchical clustering a
substructure can be found inside the clusters. Starting with C1 as a reference,
C2 contains parts with a higher mass (caused by a higher thickness d or density
ρ of the material) while the parts in C3 result from geometrical modifications
increasing the surface (e.g. caused by additional beadings for higher stiffness).
In the clustering plot of BMW data, Fig. 6 (right), the light grey dots belong to
three modifications of the same part, namely 11011 "Motorträger". An example
for typical modifications and their influences on the meta data can be seen in 
Fig. 5, where similar parts have been selected from the data base. 

This clustering of parts in the meta data space in order to identify variants 
of designs is a time consuming task when all relevant parts and meta data are 
considered. An alternative method leading to similar results is to merge the meta 
data into a single similarity measure [8]. For the work presented in this paper 
a weighted sum of all the meta data has been employed. Then, if the weights 
are appropriately chosen, parts with the same similarity measure are similar in 
shape. This similarity measure serves as the main attribute for data mining, as 
described in section 3. 
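
A weighted sum of meta data can be sketched in a few lines. The attribute names and the weights below are purely illustrative assumptions; the text does not state which weights were used.

```python
def similarity_measure(meta, weights):
    """Collapse the meta data of one part into a single number."""
    return sum(weights[name] * meta[name] for name in weights)

# Hypothetical weights and meta data of two variants of the same part.
weights = {"surface": 1.0, "mass": 0.5, "edge_length": 0.1}
variant_a = {"surface": 2.0, "mass": 4.0, "edge_length": 10.0}
variant_b = {"surface": 2.0, "mass": 4.4, "edge_length": 10.0}

sim_a = similarity_measure(variant_a, weights)
sim_b = similarity_measure(variant_b, weights)
```

The design choice lies entirely in the weights: if they are chosen poorly, two different shape changes can cancel out and yield the same value, so that distinct variants become indistinguishable.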

Fig. 6. Left: idea of hierarchical clustering in a schematic diagram. Two dimensions of
the similarity space are shown. Right: meta data of real parts. The grey clusters on the 
right correspond to three different variants of the same part. 

e) Evaluation and cleaning of crash result data 

For each crash simulation several values and curves, as well as images and movies 
are computed in order to evaluate the crash worthiness of this particular design. 
The bottleneck here is similar to the previous one: the scripts that calculate the values
stored in CAE-Bench can be altered at any time, such that the compatibility of 
the values has to be ensured before data mining can be attempted. No automatic 
approach could be developed to check this compatibility so far. In this work 
values whose scripts have been left unchanged for all simulations have been used 
for the DM analysis. This could, however, be a serious drawback of the method 
and other possibilities of ensuring reproducible values for the crash results have 
been discussed with BMW. In this paper the only result values analysed are 
intrusions. An intrusion measures the change in the distance between two points
inside the car (each given by one FE node) from before to after the crash test.
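
As a small illustration, an intrusion value can be computed from the coordinates of the two nodes before and after the crash. The sign convention (positive when the two points have moved closer together) and the function name are assumptions made for this sketch.

```python
import math

def intrusion(p_before, q_before, p_after, q_after):
    """Reduction of the distance between two FE nodes caused by the crash."""
    return math.dist(p_before, q_before) - math.dist(p_after, q_after)

# Hypothetical node positions in metres.
value = intrusion((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                  (0.1, 0.0, 0.0), (0.9, 0.0, 0.0))
```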

In the last step of data preparation the data base is reordered. A table con- 
taining one line per crash test is formed: the name of the model, the similarity 
values of the parts and the result values of interest. 

3 Data mining on similarity data

The aim of this work is to evaluate the applicability of data mining methods 
for simulation data in engineering. As a result of a complex data preparation
procedure, a table suitable for data mining is obtained in which the simulation
data appear transformed into geometrical meta data. This table (Fig. 7) is
written in Weka format, for which readily applicable data mining algorithms are 
available, see [11,12]. 
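
Writing such a table in Weka's ARFF text format can be sketched as follows. The relation name, the attribute names, the class values and the numbers are placeholders invented for the example; only the general ARFF layout (`@relation`, `@attribute`, `@data`) follows the Weka file format.

```python
def to_arff(relation, attribute_names, class_name, class_values, rows):
    """Serialise one row per crash test as ARFF text.

    rows: iterable of (similarity_values, class_label) pairs.
    """
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} numeric")
    # A nominal class attribute lists its allowed values in braces.
    lines.append(f"@attribute {class_name} {{{','.join(class_values)}}}")
    lines += ["", "@data"]
    for values, label in rows:
        lines.append(",".join([f"{v:g}" for v in values] + [label]))
    return "\n".join(lines) + "\n"

arff_text = to_arff(
    relation="crash",
    attribute_names=["Sim_11013", "Sim_15121"],
    class_name="intrusion_class",
    class_values=["good", "poor"],
    rows=[((5.2, 3.1), "good"), ((5.9, 3.1), "poor")],
)
```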

3.1 Attribute Selection 

An important step in data mining is the selection of those attributes that are
relevant predictors before starting to build the model [9]. This is important
because too many attributes may be available when the full data set is
encountered. Irrelevant information should be excluded from the data set [10].
A feature selection algorithm can thus show which attributes have the strongest
influence on the class. For the crash simulation data this information can be
particularly valuable, as it reduces a vast number of geometrical modifications
to a small number of seemingly important ones.

Fig. 7. Re-ordered table for data mining: each line contains one model with the
similarity values of the relevant parts and serves as an instance for analysis.
Simulation results (intrusions) are attached and serve as classes.

Employing an attribute selection algorithm on the crash data, e.g. ChiSquared 
in Weka, means that parts are ranked depending on the impact of the variation 
of their similarity measure on the intrusion of interest. In Figure 8 one result of 
such a calculation is shown. The list of parts shows those 6 out of 1200 whose 
variations have most influence on the intrusion. In this case the data basis
comprises only 30 models, so the transferability of this result is likely to be
rather limited.
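
The idea behind such a ranking can be shown with a plain chi-squared statistic computed from a contingency table. This sketch is a simplification and not Weka's implementation: it treats every similarity value as a nominal variant label, and the attribute names and data are invented.

```python
from collections import Counter

def chi_squared(attribute_values, class_labels):
    """Chi-squared statistic of one nominal attribute against the class."""
    n = len(class_labels)
    observed = Counter(zip(attribute_values, class_labels))
    attr_totals = Counter(attribute_values)
    class_totals = Counter(class_labels)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for c, c_count in class_totals.items():
            expected = a_count * c_count / n   # count under independence
            stat += (observed[(a, c)] - expected) ** 2 / expected
    return stat

def rank_attributes(table, class_labels):
    """Sort attributes by decreasing chi-squared score."""
    scores = {name: chi_squared(values, class_labels)
              for name, values in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Four hypothetical models: variant labels per part and an intrusion class.
classes = ["low", "low", "high", "high"]
table = {
    "Sim_B": ["v1", "v2", "v1", "v2"],   # unrelated to the class
    "Sim_A": ["v1", "v1", "v2", "v2"],   # changes together with the class
}
ranking = rank_attributes(table, classes)
```

With as few instances as in this toy table, or in the 30 models mentioned above, a high score should be read as a hint for the engineer rather than as statistical evidence.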



Attribute Evaluator (supervised, Class (nominal): 351 value_Some_Intrusion):
    Chi-squared Ranking Filter

Ranked attributes:
33.5022     1 Sim_11013
25.0281    55 Sim_15121
21.1853   148 Sim_11032
21.1853   146 Sim_11042
21.1853   147 Sim_11041
21.1853   149 Sim_11031

Fig. 8. Attribute selection with Weka: The six parts whose variations have most influ- 
ence on the intrusion. BMW has confirmed the importance of these parts for the front 
crash simulated here. 




[Figure 8 part labels, as extracted: H1A_MOTORTRAEGER VO_1; G1A_SCHOTTBLECH EINSTIEG HI_1; I1A_BUCHSE VO MOTORTRAEGER VO_2; I1A_BUCHSE HI MOTORTRAEGER VO_2]

3.2 Decision Trees



Another method employed in order to demonstrate the possible outcome of data
mining on simulation data is the decision tree method. Here a further step has
been taken towards results that are relevant to the application engineer. In
practice the engineer is very rarely interested in the behaviour of only one
result value; instead, an understanding of the influence of a design
modification on a range of values is needed. For this reason four result
values were selected and clustered into three groups, one of which covers the
most desired vehicle behaviour during crash. The clustering of the instances into
three groups is demonstrated in Figure 9 for two of the result values. A clear
grouping into "good" (circles), "medium" (squares) and "poor" (triangles) crash
tests can be seen.

Fig. 9. Clustering of result values "Intrusion 1 ... 4" in order to be able to represent
various aspects of crash behaviour with one single class value. 



The membership of a model in these clusters is then used as the "class" when the
decision tree is built. The similarity measure, in this case again a weighted
sum of all meta data, is employed as attribute.
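
Grouping the result values into three classes can be sketched with a basic k-means loop. The text does not name the clustering algorithm, so k-means is an assumption here, as are the starting centres, the toy intrusion values and the rule that smaller intrusions count as better crash behaviour.

```python
import math

def k_means(points, centres, iterations=20):
    """Plain k-means; `centres` holds the initial cluster centres."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda k: math.dist(p, centres[k]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centre
            for g, centre in zip(groups, centres)
        ]
    labels = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
              for p in points]
    return labels, centres

# Hypothetical (intrusion 1, intrusion 2) values of six crash tests.
results = [(10, 12), (11, 13), (30, 33), (31, 30), (55, 60), (57, 58)]
labels, centres = k_means(results, [results[0], results[2], results[4]])

# Name the clusters by increasing intrusion (assumed: smaller is better).
order = sorted(range(3), key=lambda k: sum(centres[k]))
names = {cluster: name
         for cluster, name in zip(order, ["good", "medium", "poor"])}
classes = [names[label] for label in labels]
```

The resulting `classes` list would play the role of the single class value per model that the decision tree is trained on.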

An example of such a tree is shown in Figure 10. The tree shows in
which cluster a car model can be expected to lie depending on the geometrical
version of the parts contained in the car model. The parts thus form the nodes
of the tree.

For this example a data basis of 77 crash tests has been employed, which is
still a rather small basis for rule building. However, these results are promising
because the similarity measure seems capable of adequately representing shape
modifications and leads to meaningful results.
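
How a tree node selects a part and a threshold on its similarity value can be illustrated by searching for the split with the highest information gain. This is a deliberately reduced sketch of a single split, not the J48 algorithm itself (which additionally uses gain ratio and pruning), and the attribute values are invented.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, attribute, threshold) of the best binary split.

    rows: list of {attribute_name: similarity_value} dictionaries.
    """
    base = entropy(labels)
    best = (0.0, None, None)
    for attr in rows[0]:
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds lie halfway between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for row, l in zip(rows, labels) if row[attr] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[attr] > threshold]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, attr, threshold)
    return best

# Hypothetical similarity values; 0.0 means the part is absent in the model.
rows = [
    {"Sim_11631": 0.0, "Sim_13001": 2.0},
    {"Sim_11631": 0.0, "Sim_13001": 3.0},
    {"Sim_11631": 1.5, "Sim_13001": 2.0},
    {"Sim_11631": 1.6, "Sim_13001": 3.0},
]
labels = ["poor", "poor", "good", "good"]
gain, attribute, threshold = best_split(rows, labels)
```

In this toy data the presence of the part (similarity value above the threshold) separates the two classes completely, which mirrors the kind of rule a root node expresses.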



3.3 DM Reports 

The results achieved within the SCAI-DM framework are imported into CAE-Bench
in order to be accessible to other engineers at other times. The reporting
tool in CAE-Bench can include text and figures, such that a data mining report
can be stored with the underlying input decks in CAE-Bench. This closes the
circle of the procedure shown in Fig. 1.

Fig. 10. Decision tree built with weka.classifiers.trees.J48 -C 0.25 -M 2. The
existence of the part 11631 "Abstützung Lenksäule Unterteil" is the determining
factor for the intrusions. If this part was integrated (> 0), further important
parts in this data basis are two modifications of part 13001 "Bodenblech vorne"
and part 12162 "Verbindung Längsträger".

4 Results 

The applicability of data mining on crash simulation data has been demonstrated 
in this work. A framework for data preparation has been developed. The com- 
putation and handling of meta data for similarity search has been studied in 
detail. The employed similarity measure has proved to be appropriate for detec- 
tion of relevant changes in shape. The usability of the approach on data from 
an automotive application has been shown. Due to the limited amount of data 
available for this work conclusions are limited, but first significant results have 
been achieved on a test set of data. The next step is the integration of
selected algorithms and data preparation tools into CAE-Bench. As soon as this
has been accomplished, the method needs to be validated on a more substantial
data set, i.e. within the working environment of BMW. It will then be feasible
to judge whether the questions originally posed can be answered.